Phoneme-Based Transliteration of Foreign Names for OOV Problem
Abstract
One problem that seriously affects the performance of cross-language information retrieval (CLIR) is the processing of queries with embedded foreign names. A proper-noun dictionary is never complete, which renders dictionary-based name translation from English to Chinese ineffective. One way to solve this problem is not to rely on a dictionary alone but to adopt automatic translation according to pronunciation similarities, i.e. to map the phonemes comprising an English name to the sound units (e.g. pinyin) of the corresponding Chinese name. This process is called transliteration. We present a statistical transliteration method for CLIR applications and describe an efficient algorithm for phoneme alignment. Unlike traditional rule-based approaches, our method is data-driven, so it is independent of dialect features in Chinese. In addition, it differs from other statistical approaches based on the source-channel framework in that we adopt a direct transliteration model, i.e. the direction of probabilistic estimation is consistent with the transliteration direction. We demonstrate accuracy comparable to that of other systems.
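The idea of a direct model, where probabilities are estimated in the same direction as the transliteration itself, can be illustrated with a minimal sketch. It is not the paper's algorithm: the aligned pairs in `ALIGNED_UNITS` are invented toy data, the function names `train_direct` and `transliterate` are hypothetical, and treating each phoneme independently is a simplifying assumption made only to keep the example short.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (English phoneme, pinyin unit) pairs, invented for
# illustration only; a real system would obtain them from phoneme alignment.
ALIGNED_UNITS = [
    ("B", "bu"), ("B", "bu"), ("B", "bo"),
    ("SH", "shi"), ("SH", "shi"), ("SH", "xi"),
]


def train_direct(aligned_units):
    """Estimate P(pinyin | phoneme) by relative frequency (direct model)."""
    counts = defaultdict(Counter)
    for phoneme, pinyin in aligned_units:
        counts[phoneme][pinyin] += 1
    model = {}
    for phoneme, targets in counts.items():
        total = sum(targets.values())
        model[phoneme] = {p: c / total for p, c in targets.items()}
    return model


def transliterate(phonemes, model, unknown="?"):
    """Pick the most probable pinyin unit for each phoneme independently."""
    output = []
    for phoneme in phonemes:
        dist = model.get(phoneme)
        # Phonemes never seen in training are marked with a placeholder.
        output.append(max(dist, key=dist.get) if dist else unknown)
    return output
```

Calling `transliterate(["B", "SH"], train_direct(ALIGNED_UNITS))` selects the most frequent pinyin unit for each phoneme. A source-channel system would instead estimate P(phoneme | pinyin) together with a prior over pinyin sequences and combine them through Bayes' rule.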
Similar resources
Phoneme-based Statistical Transliteration of Foreign Names for OOV Problem
Given a source-language term, machine transliteration automatically generates its phonetic equivalent in a target language. It is useful in many cross-language applications. Recently, there has been growing interest in automatic transliteration, especially between languages with significantly different phonetic representations, e.g. English and Chinese. Despite many cross-language...
Language Independent Transliteration System Using Phrase-based SMT Approach on Substrings
Every day the newswire introduces events from all over the world, highlighting new names of persons, locations and organizations with different origins. These names appear as out-of-vocabulary (OOV) words for machine translation, cross-lingual information retrieval, and many other NLP applications. One way to deal with OOV words is to transliterate the unknown words, that is, to render them in th...
Extracting English-Korean Transliteration Equivalence from Domain-Specific Dictionaries
Automatic translation knowledge acquisition, or automatic bilingual dictionary construction, has become an important first step for natural language applications such as machine translation and cross-language information retrieval. Transliterations are used to translate proper names and technical terms, especially from languages written in Roman alphabets to languages written in non-Roman alphabets, such as from E...
Learning to Find Transliteration on the Web
This prototype demonstrates a novel method for learning to find transliterations of proper nouns on the Web, based on query expansion aimed at maximizing the probability of retrieving transliterations from existing search engines. Since the method we used involves learning the morphological relationships between names and their transliterations, we refer to this IR-based approach as morphological...
Optimizing Transliteration for Hindi/Marathi to English Using only Two Weights
Machine transliteration has received significant research attention in the last two decades. Hindi-to-English and Marathi-to-English named-entity machine transliteration has been comparatively less studied. Currently, research work in this domain is carried out using grapheme-based statistical approaches. But, to achieve better accuracy for the transliteration, an adequate bilingual t...
Publication date: 2004